Two different theories of the cognitive processes involved in rating
performance were compared by Nathan and Lord in 1983. These theories
comprised Borman's (1978) traditional model of dimensional schemata
and Feldman's (1981) cognitive categorization theory. To further explore

INTRODUCTION


In numerous organizations the mere mention of the two words
"performance appraisal" can start stomachs churning, spark
disagreements among coworkers, and create turmoil at all levels of the
organization. It is not surprising, then, to find a large amount of
literature analyzing the issues, problems, and processes of appraisal.
Perhaps the most widespread conclusion is that appraising performance
is anything but easy (Bernardin & Cardy, 1982). There are many factors
that influence the success and consequences of these evaluations. For
example, special consideration should be given to the purpose or use of
the appraisal and the instrument selected to conduct it. Also, there
are personal characteristics (e.g., race, sex, perceptions, and mental
processes) that contribute to the appraisal process. This paper
focuses specifically on two theoretical models that address the
underlying cognitive processes of the performance rater. These include
a traditional model offered by Borman (1978) and a cognitive
categorization model suggested by Feldman (1981). The purpose of the
present study is to answer a question initially proposed by Nathan and
Lord (1983): Which of these models best describes the cognitive process
used by raters to evaluate a target person? As such, this project is
mostly a modification and extension of the investigation conducted by
Nathan and Lord in 1983.


Cascio (1982) has defined performance appraisal as "the systematic
description of individual job-relevant strengths and weaknesses"
(p. 309). Appraisals are conducted in organizations for numerous
purposes. For example, they are frequently used as a basis for
promotion and placement, as a criterion to validate selection devices
and training programs, or as a basis for rewards and feedback (Kane &
Lawler, 1979). The performance appraisal is viewed as a function of
three interacting systems: the organizational setting within which
the appraisal occurs, the appraiser's capabilities to process
information, and the appraisee's behavioral patterns (Ilgen & Feldman,
1983). Facets of each system can contribute to inaccurate and biased
evaluations. The appraisal of employee performance can directly affect
not only the individual being evaluated, but also the maintenance of
the organization's effectiveness (Latham & Wexley, 1980). For these
and other reasons, it is critical that performance evaluations be as
accurate as possible. Unfortunately, this procedure has been plagued
with many deficiencies which preclude flawless appraisals.

In general, there are two types of performance measures: the
objective (nonjudgmental) and the subjective (judgmental).
Nonjudgmental data include measures of production output, errors, and
task completion times, as well as records of absenteeism, turnover,
grievances, and accidents (Landy & Farr, 1983). Performance in the
majority of jobs, however, is not easily measured in objective terms.
Generally, reliance on nonjudgmental measures will not adequately
capture the essence of employees' performance. Judgmental data, on
the other hand, allow for a wider range of discretion and application.
Many researchers have found that subjective measures, specifically
rating scales, are used by an overwhelming majority of organizations
(e.g., Bigoness, 1976; Borman, 1979; DeNisi & Stevens, 1981). Indeed,
ratings are the most ubiquitous form of performance appraisal.


Performance  Ratings 

One cannot assume from the wide use of ratings that they are the
method least susceptible to error. Rather, they are inevitably
contaminated by a host of problems.

In a comprehensive review of the literature on performance rating,
Landy and Farr (1980) asserted that three general variables influence
the rating process: (a) the roles of the rater and ratee, (b) the
vehicle or rating instrument, and (c) the rating context.
Investigations into each of these factors have differentially
contributed to our understanding of the appraisal process.

The vehicle, or rating format, has received much attention in
research and literature. The basic assumption underlying these
studies is that the type of format chosen may affect the accuracy and
adequacy of evaluations. As several authors have suggested, the
conclusions reached have been less than enlightening in finding the
superior vehicle (e.g., Bernardin & Cardy, 1982; DeNisi & Stevens,
1981).

For example, in 1979, Borman studied the effects of rating format
(and rater training) on accuracy in performance ratings. Borman
selected five formats he believed to be most promising, and student
subjects were asked to evaluate the job performance effectiveness of a
manager or recruiter. The results of a Job X Format interaction
indicated no superiority of one format over any other.

The bleak conclusion reached by many researchers has been that
"after more than 30 years of serious research, it seems that little
progress has been made in developing an efficient and psychometrically
sound alternative to the traditional graphic rating scale" (Landy &
Farr, 1980, p. 89). It appears that the rating format, itself, is
less important than the cognitive effect it has on the rater.

As if this conclusion were not disheartening enough, research into
the second component of the rating process (namely, the rating
context) has not fared much better. However, unlike the rating format,
one of the central problems associated with rating context is a
relative lack of research. Included in this area are studies
investigating the effects of the intended use of performance
evaluations as well as position characteristics. At least one major
conclusion can be drawn. It appears that ratings for administrative
purposes tend to be more lenient and less accurate than those obtained
for research purposes or employee development (DeCotiis & Petit, 1978;
Warmke & Billings, 1979; Zedeck & Cascio, 1982). Unfortunately, most
investigations in this area have been conducted for research purposes
and therefore do not provide decisive conclusions about the impact of
rating purposes within the organization. Rating variances do not
appear to be highly dependent on the purpose component; however, more
efficacious tests are needed (Landy & Farr, 1980).

Lest the reader give up hope on the rating process, attention
will now be turned to the third component, roles, which has enjoyed
much attention and greater success in reaching beneficial conclusions.
This variable can be divided into three constituents: ratee
characteristics, rater characteristics, and an interaction of the two.
A comprehensive review of these is beyond the scope of this paper;
however, a brief overview follows. For further information, readers
are directed to the literature by Cascio (1982), Dunnette and Borman
(1979), and Landy and Farr (1983).

Ratee characteristics include personal factors such as race, age,
and sex, and job-related variables such as performance level, tenure,
and reaction to performance appraisal. Rater characteristics again
include race, age, and sex, but also embrace intellectual skills,
job experience, knowledge of the ratee and the job, as well as numerous
other factors.

To many laypersons it may appear that ratings should be more a
function of the ratee's real performance than of the characteristics
of the one doing the rating. However, as Bernardin and Cardy (1982)
have stated, "there is a strong indication that ratings are as much
or more a function of the idiosyncracies of the rater who made them
than they are of the actual behavior of the ratees" (p. 352).

Of particular interest to the present project are the cognitive
characteristics of the rater. Recently, much attention has been
devoted to this area, and rightfully so. The cognitive
characteristics of raters seem to be the key to significantly
increasing our understanding of the rating process. Both industrial
and social psychologists have explored the mental processes underlying
our evaluations of others.

This current surge of interest in a cognitive approach to
performance appraisal is evidenced in a model proposed recently by
DeNisi, Cafferty, and Meglino (1984). Their model consists of a series
of interrelated steps reflecting the notion that the appraisal process
is a judgmental activity dependent on social perception and cognition.
This model has a unique perspective in that it views the rater as
actively seeking the information required to formally evaluate
performance. Accordingly, the steps involved in appraisal include:
(a) observing employee behavior, (b) cognitively representing that
behavior, (c) storing the representation in memory, (d) retrieving the
information required to make a formal evaluation, (e) examining and
integrating additional pieces of information with the retrieved data,
and (f) formally appraising the employee with the use of a rating
instrument.

In  addition  to  presenting  their  model,  DeNisi  et  al.  (1984) 
suggest  several  research  propositions.  As  a  part  of  this,  they  call 
for  an  investigation  to  determine  whether  raters  are  more  likely  to 
recall  specific  ratee  behaviors  or  only  overall  impressions  when 
appraising  performance.  There  are  theories  which  address  this  issue 
either  indirectly  or  directly.  This  proposal  focuses  on  two  such 
theories  of  cognitive  processing  which  have  recently  been  compared  by 
Nathan  and  Lord  (1983).  It  is  believed  that  through  determination 
of  which  model  more  appropriately  fits  the  rater's  cognitive  process, 
accurate  ratings  will  be  a  goal  less  distant. 


Cognitive  Categorization  Model 

Feldman (1981) developed a categorization model in order to
explain the appraisal process, which he viewed as a more specific
case of general cognitive processes. This model is therefore steeped
in cognitive-social psychology. It is also an outcome of research
conducted by Rosch and her coworkers (Rosch, 1978; Rosch, Mervis,
Gray, Johnson, & Boyes-Braem, 1976) and related explorations by
Cantor and Mischel (1977, 1979).

Schemas. To oversimplify, the cognitive categorization model
asserts that people classify and organize information about others
into general categories. This allows us to integrate, discriminate,
and simplify an enormous amount of information by attributing stability
to another's behavior (Snyder & Uranowitz, 1978). In short, it makes
life easier to contend with a neat little package of who "Mrs. Smith"
is rather than sift through and store all the different pieces of
information about her.

A schema is a cognitive structure that consists in
part of the representation of some defined stimulus
domain, including a specification of the
relationships among its attributes, as well as specific
examples or instances of the stimulus domain. As
such, one of the chief functions of a schema is to
provide an answer to the question, "what is it?"
(Taylor & Crocker, 1981, p. 91).


A schema, therefore, allows us to compartmentalize people, and if
asked about them we can activate the category we have placed them in
rather than sort through all of our specific interactions with them.
Furthermore, a category prototype is developed against which judgments
of similarity can be made. The prototype is an abstraction of the
most representative and inherent features of a schema. For example,
if you know that Mrs. Smith can prepare great pies, personally
maintains a tidy home, has raised three children, and needlepoints her
sofa pillows, you may categorize Mrs. Smith into your "traditional
homemaker" schema because she exhibits behaviors that are
prototypically associated with this category. Furthermore, upon
remembering Mrs. Smith, you may also attribute to her an ability to
cook pot roast, mend clothes, and tell bedtime stories, not because you
have observed her doing them, but rather because they are additional
behaviors categorized under your "homemaker" schema.

Judgmental applications. Once an evaluation has been made it
is subsequently used as a basis for later inferences about the person
(Srull & Wyer, 1979) and, as many researchers have found, it is
extremely difficult to change these initial judgments (e.g., Cantor &
Mischel, 1979; Lord, Ross, & Lepper, 1979). In fact, commitment to
our incipient categorization of others is so prevalent and powerful
that it may color how any future inconsistencies in their behavior are
perceived. Once an employee is categorized, further judgments of that
employee are influenced by the category prototype. This process may
produce underevaluations and/or overevaluations of the employee
(Feldman, 1981).

Ostrom, Lingle, Pryor, and Geva (1980) discuss a number of their
studies concerned with the cognitive categorization of person
impressions. One experiment (Lingle & Ostrom, 1979) focused on the
influence an initial judgment has on the ease of making a subsequent
judgment. Participants were asked to judge the appropriateness of a
stimulus person for a designated vocation. Decisions were based on a
set of stimulus traits describing the target person, which were
presented during this decision-making process. The participants were
then asked to make another decision regarding the suitability of the
target person for an occupation requiring traits similar or dissimilar
to those of the first occupation. Furthermore, the descriptive traits
were removed during this decision-making process so that the judgment
was based on memory. The speed with which subjects made this second
judgment served as the dependent variable. Results indicated that
subjects required more than a second longer to make the occupational
judgment involving dissimilar traits as opposed to those requiring
similar traits. This finding suggests that an initial thematic
decision has a strong influence on subsequent judgments.

In a related study, Lingle (1979) asked subjects to judge whether
a specified trait would characterize a target person who had already
been characterized by two other traits. Of these two traits, one
trait was relevant and the other irrelevant to making the judgment.
After making their decision, participants performed a distractor task
for 50 seconds. They were then asked to make an occupational judgment
based on memory of the stimulus person. While contemplating their
decision, subjects were interrupted by a probe word that was nearly
illegible. This word was either one of the three descriptive trait
words used in the earlier stages of the experiment or a control trait
that had not been associated with the target person. A faster
recognition speed was established for probe words previously associated
with the target as opposed to unassociated probes. Furthermore, the
recognition times revealed that subjects accessed relevant stimulus
traits more readily than irrelevant traits during the judgment
interval.

Based on these results, the authors concluded that initial
judgments guide and determine later evaluation. "Most striking is the
persistent evidence that an initial judgment, rather than factual
stimulus information, is remembered and used as the basis for
subsequent judgments" (Ostrom, Lingle, Pryor, & Geva, 1980, p. 84).
This suggests that people rely on earlier categorization, instead of
specific pieces of information, when appraising others. In effect,
categorization not only colors and biases recall, it also precludes
contradictory evidence from surfacing and, in fact, elicits confirming
evidence. Thus, testing an impression necessitates a search for
supporting evidence and, consequently, makes disconfirmation of the
impression more improbable (Feldman, 1981).

Kulik (1983) further supports the argument that initial beliefs
about others persevere even in the face of contradictory behavior.
In Kulik's study, subjects first watched a videotape of two people
getting acquainted, one of whom was a target person. During this
initial exposure, subjects developed either extroverted or introverted
schematic impressions of the target. Next, a second videotape was
viewed which confirmed or disconfirmed the initial schematic
impression. Even when disaffirming behavior was viewed, initial images
were maintained. Kulik (1983) concludes that our beliefs about others
are not likely to change "as a simple function of impartially tallying
each instance of consistent and inconsistent behavior" (p. 1978);
instead, persistence of schematic impressions seems to be the rule
rather than the exception.

Foti, Fraser, and Lord (1982) have demonstrated the significance
of categorization in perceptions of political leaders. The basic
assumption underlying their research was that the perception of a
leader is developed by comparing the person being judged with leader
prototypes. Once categorized, further evaluations will be founded
on the category prototype rather than actual behaviors. In part,
Foti et al. (1982) concluded that prototypes operate much like
stereotypes in that they specify characteristics associated with
category members.

Halo. One by-product of cognitive categorization is a phenomenon
designated by Thorndike (1920) as the halo effect. In the rating
process, halo refers to a tendency to attend to global impressions of
the ratee rather than to differentiating levels of job behavior.

Cooper (1981) distinguishes between two types of halo: true and
illusory. True halo refers to the degree of co-occurrence actually
arising in ratees' skills or covariance between rating categories
(Fox, Bizman, & Herrmann, 1983). Illusory halo exists when observed
halo surpasses true halo. The rater perceives a degree of covariance
or co-occurrence not actually reflected among ratees or the dimensions.
Investigation interests focus on the latter type of halo in lieu of
the former.
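
The relation between the two types of halo can be illustrated with a
short sketch. It assumes, purely for illustration, that halo is indexed
by the mean correlation among rating dimensions; the sources cited above
do not prescribe this index, and the function names and correlation
values below are hypothetical.

```python
from itertools import combinations


def mean_intercorrelation(corr):
    """Average the off-diagonal entries of a symmetric correlation matrix."""
    pairs = list(combinations(range(len(corr)), 2))
    return sum(corr[i][j] for i, j in pairs) / len(pairs)


def illusory_halo(observed_corr, true_corr):
    """Illusory halo (assumed index): the amount by which observed halo
    surpasses true halo, floored at zero when it does not surpass it."""
    excess = mean_intercorrelation(observed_corr) - mean_intercorrelation(true_corr)
    return max(0.0, excess)


# Hypothetical correlations among three performance dimensions.
observed = [[1.0, 0.8, 0.7],
            [0.8, 1.0, 0.6],
            [0.7, 0.6, 1.0]]   # correlations among the ratings given
true = [[1.0, 0.4, 0.3],
        [0.4, 1.0, 0.2],
        [0.3, 0.2, 1.0]]       # correlations among actual ratee skills

print(round(illusory_halo(observed, true), 2))
```

Under this assumed index, the ratings covary more strongly than the
underlying skills do, and the difference is the portion attributable to
the rater rather than to the ratees.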

Schemas can, and do, incorporate both types of halo. For example,
our homemaker schema may contain an element delineating the female
homemaker as a good seamstress and also contain an element depicting
the homemaker as a charming social hostess. However, any one specific
woman who is categorized under the homemaker schema may or may not
possess these qualities. If in fact she is socially inept but we are
so dazzled by the radiance of her other homemaker attributes, we will
tend to cast a halo around all schema-related qualities. Therefore,
a part of our perception is based on "true" halo in that she may
actually be an excellent seamstress and a good cook. However, our
perception is also influenced by "illusory" halo such that we
attribute to her an ability to entertain when, in reality, she does not
possess this ability.

Halo, in effect, is an outgrowth of schema processing. There is
a tendency to overestimate the information that is consistent with a
particular schema and simultaneously underestimate the evidence that
is inconsistent or irrelevant (Taylor & Crocker, 1981).

Halo  can  be  particularly  problematic  for  performance  ratings  in 
that  raters  may  rely  on  their  schemas  to  determine  the  ways  in  which 
rating  categories  covary.  Furthermore,  unless  disaffirming  evidence 
is  salient  and  acted  on,  ratings  will  covary  among  putatively  related 
categories  (Cooper,  1981). 

As part of a longitudinal study spanning five successive rating
periods and three and one-half years, the source and stability of
halo were examined by Vance, Winne, and Wright (1983). Data were
collected in a metropolitan police department as part of an ongoing
performance appraisal program. Results indicated that reliable halo
variance stemmed from the behavior of raters rather than from ratees.
The authors report that this finding underscores the importance of
the rater in the performance appraisal process.

Much research has been done to try to eliminate or reduce the
effects of halo in ratings (e.g., Bernardin & Pence, 1980; Cooper,
1983; Johnson, 1963; Johnson & Vidulich, 1956; Kenny & Berman, 1980).
Some efforts have been more successful than others, but
unfortunately, none have been definitive. Attempts have been made to
statistically remove the effects of halo in ratings. However, this
technique does not differentiate true from illusory halo, and ideally,
only illusory halo should be removed.

Traditional  Model 

Although there has been an enormous amount of research related
to the cognitive categorization model, relatively little has been
conducted to explore the traditional model as outlined by Borman (1978).

Overview. Briefly stated, this model asserts that raters can,
and do, differentiate behaviors into distinct separate dimensions so
that they are able to discriminately analyze observations. Borman
(1978) views performance evaluation as a three-stage process: "(a)
observing work-related behavior, (b) evaluating each of these behaviors,
and (c) weighting these evaluations to arrive at a single rating on
a performance dimension" (p. 141). Embedded in this process is an
ability to remember specific ratee behaviors.
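
Stage (c) of this process can be sketched as a weighted average. The
weighted mean is an assumption adopted here for illustration; Borman
(1978) is quoted only as saying that the evaluations are weighted, not
how. The function name and the numeric values are hypothetical.

```python
def dimension_rating(evaluations, weights=None):
    """Combine evaluations of observed behaviors into one dimension rating.

    evaluations: one numeric evaluation per observed behavior (stage b).
    weights: optional importance weight per behavior; equal weights if omitted.
    """
    if not evaluations:
        raise ValueError("at least one evaluated behavior is required")
    if weights is None:
        weights = [1.0] * len(evaluations)
    if len(weights) != len(evaluations):
        raise ValueError("one weight per evaluation is required")
    weighted_sum = sum(e * w for e, w in zip(evaluations, weights))
    return weighted_sum / sum(weights)


# Three behaviors observed on one dimension, evaluated on a hypothetical scale.
print(dimension_rating([6, 4, 2]))        # equal weights
print(dimension_rating([6, 2], [3, 1]))   # first behavior weighted three times as heavily
```

The sketch makes the memory requirement explicit: every evaluated
behavior must be available at rating time for the combination to be
computed.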

Borman (1974) alleges that individual differences in rating
accuracy occur, mostly, due to raters' divergent perspectives. As
an outgrowth, Borman suggests that a hybrid multitrait-multirater
analysis be used. This allows raters to evaluate only those
dimensions that they are in a good position to rate (Cascio, 1982).
Across all observers of an employee's performance, more accurate and
complete evaluations should be obtained. One implication is that raters
observing identical behaviors, perhaps those in the same
organizational level, should be able to provide more consistent ratings
than those raters observing separate behaviors.

Borman proposed that raters somehow combine the
evaluations of behaviors into independent performance ratings
on multiple dimensions. Such a model of the rating
process implies that raters can store information in
independent dimensional schemata and then retrieve
this dimensionally independent information when making
performance evaluations (Nathan & Lord, 1983, p. 103).

This, then, is a brief synopsis of two theories of cognitive
processes involved in performance appraisal. This is not to suggest
that they are the only models available; however, they do provide a
substantial foundation to further explore the mental processes
underlying ratings of others.

Performance  Patterns 

What if the behavior observed is variable, as would be
realistically expected in organizations? Relatively few studies have
addressed this issue. Those that have, have produced inconsistent
results. The pattern of performance exhibited by an employee may
have varying effects on the ratings they subsequently receive.

In a series of experiments, Jones, Rock, Shaver, Goethals, and
Ward (1968) examined the effect that patterns of performance had on
the ascription of ability. They found that descending performers
were attributed with greater intelligence and were expected to
outperform their ascending and random order counterparts. It appears
that performers who excel at first are viewed as more capable than
their "late bloomer" counterparts. Furthermore, this finding is
robust and replicable. Jones et al. (1968) explain that early
information about ability is heavily weighted and leads to premature
and persevering ascriptions. This seems to be consistent with the
cognitive categorization model in that early schematic impressions
are maintained regardless of subsequent patterns of performance.
However, other research findings are contradictory.

DeNisi and Stevens (1981) challenged the pattern effects by
presenting subjects with varying sales figures of a manager. One
hypothesis predicted that the manager presented as an ascending,
rather than descending, performer would receive more favorable
evaluations. Results supported this prediction and are in direct
conflict with the conclusion of Jones et al. (1968).

As previously mentioned, Nathan and Lord (1983) pitted the
categorization model against the traditional model in a study exploring
halo in performance ratings. They found the traditional model was
generally more appropriate for explaining the cognitive processes of
performance ratings. However, they also found halo effects and
evidence of other errors that were unexplainable by Borman's (1978)
theory but were consistent with cognitive categorization.

The present study further examined the roles of these models in
the performance appraisal process. Numerous researchers have
called attention to the need for further investigations in this area
(e.g., Borman, 1978, 1979; Cascio, 1982; Nathan & Lord, 1983). This
project was exploratory in nature, but should serve to enhance our
knowledge of the evaluation process.

Summary  and  Hypotheses 

Performance appraisals can have a powerful influence on the
individual evaluated as well as on organizational effectiveness. To
conduct evaluations of their employees, organizations most frequently
turn to the rating scale. This format, however, is plagued with
inherent problems that create biased results.

Accuracy in the appraisal process is vital but, unfortunately,
often lacking. Research has centered on improving this state of
affairs by examining the effect of three primary variables: (a) the
appraisal instrument, (b) the context within which the appraisal
occurs, and (c) characteristics of the ratee and rater.
Investigations concerned with the first two variables have not
advanced our knowledge of appraisal accuracy nearly as much as the more
recent focus on ratee/rater characteristics. As part of this focus,
there have been inspections of the cognitive characteristics of raters.
One result has been the development of two theoretical models:
Feldman's cognitive categorization model and Borman's traditional
model.


Feldman's model asserts that raters rely on schemas to judge the
ratee's performance. Once a judgment is made it is likely to persist
even in the face of contradictory evidence. As a by-product of
cognitive categorization, halo permeates the rating process, thereby
producing evaluations that do not differentiate among performance
dimensions. In other words, raters perceive similarity among the
components of performance and therefore tend to rate all of them
favorably or unfavorably. Based on the ratee's initial behaviors,
he/she will be classified into a general category, or schema, and
ratings will reflect prototypical behavior associated with the early
category assignment rather than specific observations. This will be
evident not only within, but also across the rated performance
dimensions.

Cognitive categorization contradicts Borman's traditional model
in that Borman avers that raters evaluate behaviors independently.
Reliance on global impressions is forgone. Instead, the specific
performance behaviors are retrieved from memory and then averaged to
arrive at a dimensional rating. Therefore, when asked to make a
single rating on a component of performance, the traditional model
assumes raters will average (through a weighting process) the
observations within a dimension and subsequently rate performance
around the mean of the performance pattern.

It is unlikely that employees will exhibit a pattern of performance
that is consistent and invariable over time. For the appraisal
process, this reality makes the rater's job more difficult. Assuming
that the appraiser first observes behavior that is mostly favorable
and later observes a primarily poor performance demonstration, what
impact will this have on the subsequent ratings? If ratings are
completed on unique dimensions of performance, the traditional model
predicts that raters will store observed behaviors under each
dimension and average the performance. Therefore, ratings on each
dimension will gravitate toward the mean of presented behaviors.
Dimensional ratings will depend on the number and extremity of the
behaviors exhibited. On the other hand, Feldman's categorization
model predicts that the ratings will reflect good performance because
raters were initially exposed to positive behaviors. Consequently,
they would classify the ratee as a good performer. One important
implication of this model is that ratings, both within and across
dimensions, will be favorable as long as the overall, initial
performance is favorable. Only the ratings predicted by Feldman's
model would change if the pattern of observed performance were
reversed. That is, if raters view primarily unfavorable behaviors
initially and later see primarily favorable behaviors, Feldman's
model predicts that the ratings will reflect the initial poor
performance, whereas Borman's predictions remain unchanged.
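
The contrast between the two predictions can be made concrete with a
small sketch. It is illustrative only: the behavior scores, the
midpoint, and the "prototype" ratings of 6 and 2 are assumptions made
for the example and are not values from the study, and the function
names are hypothetical.

```python
from statistics import mean

# Hypothetical coding: each observed behavior is scored on a 7-point
# scale (1 = very poor, 7 = very good). All scores are invented.

def traditional_rating(sessions):
    """Averaging prediction: pool every observed behavior, so the
    order in which the sessions were viewed does not matter."""
    return mean(score for session in sessions for score in session)

def categorization_rating(sessions, midpoint=4.0):
    """Categorization prediction (simplified): the first session fixes
    the category and the rating reflects that category's prototype."""
    first_impression = mean(sessions[0])
    # Assumed prototype ratings for a "good" and a "poor" performer.
    return 6.0 if first_impression > midpoint else 2.0

favorable = [6, 6, 7, 2]      # mostly good incidents
unfavorable = [2, 1, 2, 6]    # mostly poor incidents

descending = [favorable, unfavorable]   # good tape viewed first
ascending = [unfavorable, favorable]    # poor tape viewed first

for label, order in (("descending", descending), ("ascending", ascending)):
    print(label, traditional_rating(order), categorization_rating(order))
```

Under these assumptions the averaging rule gives the same rating for
both presentation orders, whereas the categorization rule gives a
different rating for each order. This is the difference the
order-of-presentation manipulation is meant to detect.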

To  be  more  specific,  this  study  investigated  the  following 
research  question  and  hypotheses: 

Research question. Which theoretical model, cognitive categorization
or the traditional model, will more adequately explain the rater's
cognitive processes when conducting a performance appraisal?

Assuming  the  superiority  of  the  cognitive  categorization  model 
led  to  the  following  hypothesis: 

Hypothesis 1a. Performance will be rated favorably if raters
are presented first with predominantly good ratee behaviors and later
with mostly poor performance behaviors. This will be evident both
within and across performance dimensions. Conversely, if presented
first with predominantly poor behaviors and later with good behaviors,
raters will evaluate performance as unfavorable.

On  the  other  hand,  if  Borman's  traditional  model  was  viewed  as 
more  meritorious  then  a  different  hypothesis  was  necessary: 

Hypothesis 1b. Across observed performances, raters will average
behavior variations and the dimensional ratings will therefore reflect
the mean exhibited behavior.


This research project involved three phases: (1) the careful
construction of two lecture scripts and videotapes, (2) the pretesting
of the videotapes, and (3) the actual performance appraisal
experiment. The first phase closely emulated the ideas and procedures
used by Nathan and Lord (1983).

Construction  of  Videotapes 

Procedure. Two videotapes were developed, each lasting
approximately 13 minutes. For each tape, a different script was
created. The same person enacted the role of a college instructor in
both tapes. The lecture material was selected from the general area
of communication. More specifically, the instructor discussed some
of the processes of communication in organizations. Each tape,
however, covered different aspects of this subject, and the two tapes
did not share the same or similar topics. Within each videotape the
lecturer displayed four behavioral incidents in each of four
performance dimensions. Based on a study by Harari and Zedeck (1973),
these four dimensions comprised:

1.  Organization:  The  lecturer's  arrangement  of  the  lecture 
material;  the  extent  to  which  the  lecturer  led  the  class 
through  a  logical  and  orderly  sequence  of  material. 

2. Delivery: The lecturer's manner of conveying the lecture
material and the extent to which she used the blackboard or
audiovisual aids to clarify and emphasize important points
of her presentation; speaking and writing abilities as well
as physical mannerisms are also included.

3. Relevance: The lecturer's choice of examples used in
conveying information; examples which were important and
meaningful to the audience.

4.  Depth  of  knowledge:  The  lecturer's  mastery  of  the  subject 
matter;  this  includes  how  well  she  knew  the  literature  and 
the  research  she  reported. 

One lecture script exemplified the instructor exhibiting
predominantly favorable teacher behavior across all four performance
dimensions, while the second script displayed primarily unfavorable
teacher behavior across the dimensions. Table 1 indicates the
specific number and type of behavioral incidents involved in each of
the tapes. A variation of good and bad incidents was used in each
videotape, rather than tapes composed of all good or all bad
behaviors, because this appeared to be a more realistic
representation of performance observed in the real world (Borman,
1978).

Development  of  Lecture  Scripts 

As  a  prerequisite  to  videotape  construction,  it  was  imperative 
to  develop  specific  behavioral  incidents  that  would  be  incorporated 
into  the  lecture  scripts. 

Subjects.  Forty  undergraduate  students  at  Texas  A&M  University 
voluntarily  completed  a  brief  questionnaire. 


Table 1

Number and Kind (Good/Poor) of Behavioral Incidents in Each
Performance Dimension of Each Videotape

                           Favorable Tape     Unfavorable Tape
                             Incidents           Incidents
Performance Dimension       Good   Poor         Good   Poor

Delivery
Organization
Relevance
Depth of Knowledge

Note. Adapted from "Cognitive Categorization and Dimensional
Schemata: A Process Approach to the Study of Halo in Performance
Ratings" by B. R. Nathan and R. G. Lord, 1983, Journal of Applied
Psychology, 68, p. 106. Copyright 1983 by the American Psychological
Association. Adapted by permission.


Procedure. Fifty specific examples characterizing both favorable
and unfavorable behaviors from the four performance dimensions were
developed by the authors. The following statement exemplified a
favorable behavior in the organization dimension: "At the end of
the lecture, the instructor summarized what was discussed." A bad
behavioral incident in the depth of knowledge dimension was depicted
by: "The lecturer failed to remember the results of a research
study."

Subjects, provided with a complete set of definitions and
descriptions of the performance dimensions, were asked to review and
critique the behavioral incidents. They were requested to (a) assign
each incident to the dimension they believed it represented, and (b)
indicate whether they felt it was an example of poor, average, or
excellent behavior through the use of a seven-point Likert scale.

The two lecture scripts were constructed from the behavioral
incidents found to be most indicative of good or bad performance
within each dimension. To be considered acceptable for inclusion in
the scripts, an item reflecting good (poor) instructor behavior had to
be rated a six or seven (one or two) by more than 50% of the
respondents. Additionally, more than 50% of the respondents needed to
agree on their assignment of the item to a particular dimension.
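
The two screening rules can be expressed as a short check. The sketch
below is illustrative: the function name, argument layout, and sample
responses are hypothetical, and only the two "more than 50%"
thresholds come from the procedure described above.

```python
from collections import Counter

def item_is_acceptable(ratings, assigned_dimensions, intended="good"):
    """Apply both screening rules to one behavioral incident.

    ratings: 7-point ratings given by the respondents.
    assigned_dimensions: the dimension each respondent chose.
    intended: "good" (needs ratings of 6-7) or "poor" (needs 1-2).
    """
    extreme = {6, 7} if intended == "good" else {1, 2}
    share_extreme = sum(r in extreme for r in ratings) / len(ratings)
    # Share of respondents who chose the most frequently chosen dimension.
    top_count = Counter(assigned_dimensions).most_common(1)[0][1]
    share_agree = top_count / len(assigned_dimensions)
    # Both shares must strictly exceed one half.
    return share_extreme > 0.5 and share_agree > 0.5

# Invented responses from five respondents to one "good" incident.
ratings = [7, 6, 6, 5, 4]
dimensions = ["organization"] * 4 + ["delivery"]
print(item_is_acceptable(ratings, dimensions, intended="good"))
```

Because the rule is "more than 50%", an incident rated six or seven by
exactly half of the respondents would be rejected in this sketch.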

Videotape  Pretesting 

Subjects. Thirty undergraduate students (14 males, 16 females)
participated in the pretesting to fulfill an introductory psychology
course requirement. Students involved in the development of the
lecture scripts could not participate in this phase of the experiment.

Procedure. The two videotaped lectures were shown to all
participants. After each tape, three 7-point Likert rating scales
(with extreme anchors of "very poor" and "very good") were provided
and subjects were requested to rate the instructor's performance in
each separate tape. Results indicated that the tape designed to
exhibit predominantly favorable behaviors received a mean rating of
4.7 (corresponding to the scale anchor of "somewhat good") while the
tape with primarily unfavorable behaviors received a mean rating of
1.8 (corresponding to the scale anchor of "mostly poor"). This
appeared to be an adequate differentiation of the two tapes to permit
their use in the next phase of the experiment, t(58) = 12.29,
p < .001.


Performance  Appraisal  Experiment 


Subjects. Fifty-four undergraduate students (30 males, 24
females) participated in exchange for course credit. They were
randomly assigned to one of two experimental conditions. The average
age of participants was 19.5 years; 63% were freshmen; and 39% were
in a business-related field of study. Subjects involved in either
the earlier development of the lecture scripts or the pretesting of
the videotapes could not participate in this experiment.

Procedure. At the onset of the study subjects were given a
complete explanation of each performance dimension and were also
informed that they would later appraise the performance of a college
instructor based on these dimensions. Nathan and Lord (1983) provided
the rationale for following these procedures:

    First, they embody many of the steps that are commonly
    thought to produce good ratings. Second, they should
    produce controlled rather than automatic processing
    of information (Feldman, 1981). Third, if the
    traditional model is correct, such conditions should
    minimize the occurrence of halo, leading to differential
    predictions for the traditional as compared to
    the categorization model (p. 105).


Subjects were then shown the two videotaped lectures. The tapes
were presented one day apart, and during the final, third-day
session subjects were given the seven-point scales on which their
ratings of the instructor were made.

Independent variable. As previously mentioned, the single
independent variable was the order of videotape presentation. One
group of subjects viewed the tapes in descending performance order.
That is, they viewed the favorable tape during the first session and
the unfavorable tape during the second session. The order of
presentation was reversed for the remaining group. Showing the tapes
with a one-day delay and manipulating the order of presentation was
believed to replicate more realistically the kind of observations
made in the business world. Typically, performance appraisers do not
observe individuals behaving consistently throughout the entire
rating period.

Dependent variable. After viewing both tapes, participants
assessed the instructor's overall performance as well as her specific
performance in each of the four dimensions. These judgments were
measured on seven-point Likert rating scales. Three ratings
questioning the lecturer's preparation and presentation of the
lecture material were combined to form the organization scale. Three
questions pertaining to the lecturer's speaking and writing abilities
as well as the use of visual aids were combined to form a measure of
the lecturer's delivery. A relevance scale was developed by combining
three questions covering the lecturer's choice of meaningful and
interesting examples. Finally, a measure of the lecturer's depth of
knowledge was formed by combining three questions relating to the
lecturer's mastery of the subject matter and understanding of the
literature and research reported. The reliabilities of the
organization, delivery, relevance, and depth of knowledge scales were
computed, revealing coefficient alphas of .79, .53, .75, and .78
respectively.
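
For readers unfamiliar with the reliability index reported above, the
following is a minimal sketch of how coefficient alpha is
conventionally computed for a multi-question scale. The function name
and the sample scores are hypothetical; it is not the analysis code
used in the study.

```python
from statistics import variance

def coefficient_alpha(items):
    """Cronbach's coefficient alpha.

    items: one list of scores per question, each list ordered by rater.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-rater scale score
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Invented responses of five raters to a three-question scale.
question_1 = [2, 3, 5, 4, 6]
question_2 = [3, 3, 4, 5, 6]
question_3 = [2, 4, 5, 3, 7]
print(round(coefficient_alpha([question_1, question_2, question_3]), 2))
```

Alpha approaches 1 when the questions rise and fall together across
raters and drops when they do not, which is why the .53 obtained for
the delivery scale indicates that its three questions were only
loosely related.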

Additionally, subjects were presented with a recognition memory
test consisting of a list of 32 critical incidents. Of these
incidents, eight were selected from the favorable tape, eight from
the unfavorable tape, and the remaining 16 incidents were not present
in either videotape. Subjects indicated whether or not each incident
appeared in either tape. In this way any differences in the memory
capabilities of each experimental group could be detected.
Furthermore, the model that can best explain the rater's cognitive
processes should be supported, in part, by the responses on the
recognition measure. Borman's model would suggest that raters
accurately recall the behaviors exhibited by the instructor. However,
Feldman's model would predict less accurate recognitions. More
specifically, the rater's recall would be based on the schema
activated after viewing the first videotape rather than on each
specific behavior exhibited.

Manipulation check. To ensure the efficacy of the independent
variable, subjects responded to a brief post-experimental
questionnaire. This measured the extent to which participants viewed
each tape as a favorable or unfavorable demonstration of lecture
performance.


RESULTS 


The results of this study are presented in three sections. These
include the manipulation check, the performance ratings, and a memory
recognition analysis.

Manipulation  Check 

In order to assess the extent to which the experimental
manipulation was effective, respondents completed three questions:
(1) specifically in the first videotape viewed, how would you
evaluate the lecturer's performance, (2) specifically in the second
videotape viewed, how would you evaluate the lecturer's performance,
and (3) considering both videotapes viewed, in which one was the
lecturer's performance most favorable. Responses to the first two
questions were collected on seven-point rating scales. The third
question required subjects to check either the first videotape viewed
or the second (dummy coded 1 and 2 respectively). Summary descriptive
statistics along with the results of a one-way analysis of variance
for each of the three manipulation check items are presented in
Table 2. In reviewing the results, it is necessary to remember that
participants in group 1 viewed the favorable lecture first, followed
by the presentation of the unfavorable lecture during the second
experimental session. The order of tape presentation was reversed for
group 2.

The analysis of variance for the first item indicates a
significant difference in the groups' perceptions of the first
videotape, F(1,52) = 126.92, p < .0001. This difference was also
found in the perceptions of the second videotape viewed by each
group, F(1,52) = 46.99, p < .0001.

Table 2

In order to correctly and fully test the competing hypotheses
it was necessary for participants to perceive both a favorable and
an unfavorable lecture performance. The manipulation check indicates
that this requisite was not met. Most significant is the finding that
the lecturer's performance in the "favorable" tape was perceived as
average. Evidently, the performance ratings provided by participants
in the pretest experiment did not coincide with the perceptions of
the respondents in this phase. Unfortunately, this result must color
the interpretation of the data reported subsequently. Any hypothesis
testing that relies on the presence of a favorable and an unfavorable
tape (rather than an average and an unfavorable tape) may be
uninterpretable or, at least, inappropriate. However, tests of
hypotheses that are merely dependent on a differentiation of the
tapes' favorabilities may be salvageable.

The analysis of the third manipulation check item revealed that
participants easily identified which tape reflected the more favorable
lecture performance, F(1,52) = 312.50, p < .0001. A mean of 1.0 was
expected for group 1, indicating that of the two tapes viewed, the
first was perceived as more favorable. On the other hand, a mean of
2.0 was expected for group 2, indicating that the second tape viewed
was selected as the better performance. The actual means for groups 1
and 2 were 1.04 and 1.96 respectively.

Also included in the post-experimental questionnaire were three
items designed to assess the participants' (1) prior experience in
evaluating an instructor's performance, (2) knowledge of the lecture
topic (communication in organizations), and (3) familiarity with the
female lecturer in the tapes. Responses (all coded yes = 1, no = 2)
indicated that there were no significant differences in (1) the
groups' prior experience with evaluating an instructor's performance,
F(1,52) = 1.17, p > .28, with means of 1.56 and 1.41 for groups 1 and
2 respectively, or (2) their knowledge of the lecture topic,
F(1,52) = 0.11, p > .74, with means of 1.82 and 1.78 respectively.
Additionally, no participants in either group indicated that they had
prior familiarity with the lecturer in the videotapes.

Performance  Ratings 


Rating scales. Before examining the actual performance ratings,
a preliminary check was conducted to determine the degree to which
the four dimensional scales were intercorrelated. Table 3 presents
these results. By examining the correlations, the degree of halo
within the ratings can be assessed. Higher correlations indicate
less rater discrimination of the lecturer's behaviors and thus more
halo (Saal, Downey, & Lahey, 1980). Overall, halo appears to be
prominent in the ratings. The correlation between the organization
and delivery dimensions (r = .75, p < .001) is particularly high.
As Feldman's model predicts, there was a reliance on global
impressions. The presence of halo contradicts Borman's prediction
that subjects would differentiate performance behaviors rather than
depend on vague abstractions.

As previously stated, if Borman's model is correct the performance
ratings should reflect the mean of the instructor's behaviors.
There should not be any significant differences in the ratings
produced by the two experimental groups. However, if Feldman's model
is supported, the ratings should mirror the videotape initially
viewed by the rater. Accordingly, significant differences would be
expected in the ratings of the two groups. The means and standard
deviations for the four specific scale ratings are presented in
Table 4. This analysis shows that each group provided similar mean
dimensional ratings.

Furthermore, the effects of the experimental manipulation were
assessed through a multivariate analysis of variance with the four
performance scales combined to serve as the dependent variable,
while the order of tape presentation served as the independent
variable. Wilks' lambda revealed an F(approx) = .40, df = 4, 49,
p > .80. Clearly, across all dependent variables there were no
significant differences in the ratings of the two experimental
groups. With these results, statistical guidelines did not permit a
subsequent series of univariate, one-way analyses of variance with
each performance scale serving as a separate dependent variable.

As  previously  stated,  the  reliability  of  the  delivery  dimensional 
scale  was  .53,  which  was  particularly  low.  Therefore,  each  item 
comprising  the  delivery  scale  was  analyzed  separately.  The  analysis 
of  variance,  however,  did  not  reveal  any  significant  differences. 

Finally, on a seven-point rating scale, subjects were asked to
evaluate the lecturer's overall performance across both videotapes.
These responses served as the dependent variable in a one-way analysis
of variance while the two experimental groups served as the
independent variable. The analysis revealed no significant
differences in the overall performance ratings, F(1,52) = .02,
p > .88. The mean overall rating for group 1 was 2.56 while for
group 2 the mean was 2.59. These ratings are nearly an exact average
of the groups' perceptions of the two tapes separately (see Table 2).
This result partially supports Borman's predictions, in that subjects
appeared to base their overall rating on an average of the individual
tape ratings rather than rely on the schema activated by the initial
tape viewed, as predicted by Feldman. However, because the
"favorable" tape was perceived as only average, Feldman's predictions
could not be appropriately tested.

Memory  Recognition 


Memory check. To determine whether there were any differences
in the memory capacities of the two experimental groups, a correlated
t-test was computed. This analysis examined the subjects' ability
to accurately distinguish the behavioral items that were present from
those that were absent in the videotapes. Due to the random
assignment of subjects to experimental conditions, differences in
memory were neither expected nor found, t(53) = .36, p > .72.

Design. The 32 behavioral items were grouped according to
their (a) association with the favorable or unfavorable tape, (b)
presence or absence in either videotape, and (c) exemplification of
good or bad lecturer behavior. The design for this part of the
analyses was therefore a 2 (order of tape presentation) X 2
(favorable/unfavorable videotape) X 2 (presence/absence of item) X 2
(good/bad behavior). Due to perceptions of the favorable tape as
merely average, this design was subsequently altered to a 2 (tape
order) X 2 (presence/absence) X 2 (good/bad) analysis with one
between-subjects factor and two within-subjects factors.

Item analyses. The means and cell deviations associated with
the significant findings of an analysis of variance are presented in
Table 5. Additionally, all results of this repeated measures analysis
of variance are presented in Table 6. Each recognition item was
coded one if the subject accurately identified the item and zero if
the response was inaccurate. The grand mean across all memory items
was .66, which is not particularly high. This finding does not
support Borman's contention that raters remember the specific
behaviors exhibited by the ratee.
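
The 0/1 accuracy coding can be sketched as follows. The function
names and the four-item example are hypothetical and are shown only
to make the scoring rule explicit; the actual test contained 32
items.

```python
def score_recognition(responses, answer_key):
    """Code each item 1 if the yes/no response matches whether the
    incident actually appeared in a tape, and 0 otherwise."""
    return [int(said_yes == was_present)
            for said_yes, was_present in zip(responses, answer_key)]

def mean_accuracy(codes):
    """Proportion of items answered correctly."""
    return sum(codes) / len(codes)

# Invented data for four items: True = "appeared in a tape".
answer_key = [True, True, False, False]
responses = [True, False, True, False]

codes = score_recognition(responses, answer_key)
print(codes, mean_accuracy(codes))
```

Because half of the 32 items were present and half were absent, a
subject who guessed at random would be expected to score about .50 on
this measure, which gives a reference point for the obtained grand
mean of .66.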

Table 6 reveals a marginally significant main effect for the
order of tape presentation. This finding, coupled with the
significant interaction of tape order by presence/absence of item,
indicates a probable difference in the memory abilities of the two
groups.

More specifically, the means in Table 5 show that both groups
were equivalently accurate in recalling the present items. However,
there was a large difference (.60 for group 1 and .72 for group 2)
in their abilities to accurately recognize items that were not present
in either tape.

Table 5

Means and Cell Deviations for Recognition
Memory Items (Significant Results Only)

Tape Order                       M        SD
  Group 1                       .64      .20
  Group 2                       .69      .17

Presence/Absence                 Tape Order
of Items                     Group 1   Group 2
  Present Items      M         .68       .66
                     SD        .19       .15
  Absent Items       M         .60       .72
                     SD        .21       .20

Presence/Absence                  Behavior
of Items                       Good      Bad
  Present Items      M         .63       .71
                     SD        .18       .16
  Absent Items       M         .72       .60
                     SD        .17       .25

Note. Higher mean values indicate more accurate recognition.

The most significant result was the interaction between the
presence/absence of items and their exemplification of good/bad
behavior, F(1,52) = 14.46, p < .0005. The means indicate that
subjects were more accurate on bad behaviors that were present in the
tapes (.71) and good behaviors that were absent (.72) than on good
behaviors that were present (.63) and bad behaviors that were absent
(.60). This interaction lends support to Feldman's prediction that
subjects rely on schemas rather than on specific behaviors to
evaluate performance. The schema in this case would reflect an
instructor exhibiting predominantly poor performance across both
videotapes, as indicated in the overall performance ratings.
Therefore, cognitive categorization predicts that responses should
reflect a global recognition of items exemplifying poor behaviors and
a global lack of recognition of items exemplifying favorable
performance. This pattern is reflected in the lower recognition
accuracy for good/present items and bad/absent items and, at the same
time, a higher accuracy for recognizing good/absent items and
bad/present items. Subjects more accurately remembered behaviors that
reinforced their schema of an incompetent instructor and failed to
accurately recall the behaviors that were inconsistent with the
schema.


DISCUSSION


Performance ratings can have a pervasive impact on many facets
of the organization. Gaining a better understanding of the rater's
implicit cognitive processes was the primary aim of this study.
Specifically, it was designed to assess which of two theoretical
models most adequately explains these cognitive operations. The
original research question focused on two competing hypotheses.
Assuming the superiority of one cognitive model over the other, each
hypothesis made corresponding predictions of the performance appraisal
outcomes. Based on the analytical results, it seems that neither
the traditional nor the categorization model alone can completely
account for the cognitions of the subjects involved in the present
experiment. Both models are partially supported and at the same time
contradicted.

A brief review of the results indicates support for Borman's
traditional model in two fundamental areas, both of which directly
relate to the performance ratings. First, there were no significant
differences in the groups' ratings on the four dimensional scales.
Secondly, evaluations of the lecturer's overall performance did not
significantly differ across the groups. Instead, these ratings
centered around the mean of all the lecturer's behaviors. According
to Borman, there should not be differences in the scale and overall
ratings because raters accurately remember specific dimensional
behaviors. Based on the findings reported in the following section,
this assumption appears to be erroneous.


The cognitive categorization model also finds partial support
in the data. Unlike the traditional model, however, this support
does not generally stem from the performance ratings but rather is
embedded in the results of the recognition memory task. In contrast
to Borman's predictions, the analysis revealed an overall inability
of raters to accurately recall the behavioral items. This is
inconsistent with the accuracy observed in the performance ratings.
Subjects were able to accurately rate the instructor's performance
across both videotapes, as witnessed in the mean overall ratings.
However, the recognition task revealed that subjects did not remember
the specific behaviors that collectively produced the overall
performance. Feldman's model predicts that recall should be based on
the schema initially activated rather than on specific lecturer
behaviors. Contrary to experimental intentions, the schema activated
by viewing both videotapes was predominantly of an unfavorable
lecturer. Therefore, the categorization model contends that raters'
recognition scores should indicate a global recall of items
reflecting unfavorable lecture behaviors. In completing the
recognition task, subjects should agree to seeing more unfavorable
than favorable behaviors due to reliance on their schemas. This
should generally occur across all recognition items irrespective of
their actual presence or absence in the videotapes. In effect, saying
"yes" to the unfavorable items leads to higher accuracy scores for
those items that were actually present but lower accuracy for absent
items. Conversely, reliance on the unfavorable lecturer schema leads
to negative responses to favorable items, thereby producing higher
accuracy for absent favorable items and lower accuracy for present
items. This pattern is clearly revealed in the memory recognition
scores.
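
The response-bias argument above can be illustrated with an idealized
sketch. It assumes a purely schema-driven rater who answers "yes" to
every unfavorable item and "no" to every favorable item; this extreme
rater and the function names are assumptions made for illustration
and do not model the actual subjects.

```python
from itertools import product

def schema_response(valence):
    """Idealized rater holding an 'unfavorable lecturer' schema:
    reports having seen every bad incident and no good incident."""
    return valence == "bad"

def accuracy_by_cell():
    """Accuracy (1 = correct, 0 = incorrect) for each combination of
    item valence and actual presence in the tapes."""
    cells = {}
    for valence, present in product(("good", "bad"), (True, False)):
        said_yes = schema_response(valence)
        cells[(valence, present)] = int(said_yes == present)
    return cells

for (valence, present), accuracy in accuracy_by_cell().items():
    status = "present" if present else "absent"
    print(f"{valence}/{status}: {accuracy}")
```

Such a rater is fully accurate on bad/present and good/absent items
and fully inaccurate on good/present and bad/absent items. The
observed cell means differ in the same direction, although far less
extremely, which is consistent with a partial rather than exclusive
reliance on the schema.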

The  halo  present  in  the  ratings  further  supports  categorization's 
prediction  that  raters  classify  people  into  global  categories.  This 
general  and  unfavorable  impression  of  the  lecturer  thereby  produced 
a  large  halo  effect. 
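
As a reminder of how halo was indexed in the Results, namely by the
intercorrelations among the four dimensional scales, the following
sketch computes pairwise Pearson correlations. The scale scores are
invented and the function names are hypothetical; this is not the
study's analysis code.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def scale_intercorrelations(scales):
    """scales: dict mapping a dimension name to its per-rater scores.
    Returns the correlation for every pair of dimensions."""
    return {(a, b): pearson_r(scales[a], scales[b])
            for a, b in combinations(scales, 2)}

# Invented scale scores for five raters.
scales = {
    "organization": [2, 3, 4, 3, 5],
    "delivery": [2, 4, 4, 3, 5],
    "relevance": [3, 3, 5, 2, 4],
}
for pair, r in scale_intercorrelations(scales).items():
    print(pair, round(r, 2))
```

Uniformly high correlations across pairs mean that raters who scored
the lecturer high on one dimension also scored her high on the
others, which is the pattern interpreted as halo.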

Unfortunately,  a  complete  test  of  the  competing  hypotheses 
could  not  be  conducted  due  to  perceptions  of  the  favorable  tape  as 
merely  average.  In  an  attempt  to  mimic  real  world  performance,  each 
videotape  contained  both  favorable  and  unfavorable  behaviors.  This 
may  have  precluded  raters  from  perceiving  the  favorable  tape  more 
positively.  Perhaps  for  exploratory  purposes  there  needs  to  be 
a  greater  disparity  between  the  good  and  bad  behavioral  examples.
Furthermore,  there  may  have  been  an  inherent  problem  in  the  use  of
videotapes.  Passively  viewing  a  tape  is  not  characteristic  of  raters  in
organizations,  where  there  is  opportunity  for  dynamic  interaction  with
the  ratee.  The  sterile  tapes  provided  no  such  interchange.  One  potential
solution  to  this  problem  is  the  addition  of  a  fifth  performance  dimension
in  the  videotapes,  namely,  interpersonal  relations  with  students 
(Harari  &  Zedeck,  1973).  A  more  favorable  and  holistic  impression
may  develop  by  viewing  an  interaction  with  others. 

How  does  one  account  for  the  general  finding  that  Borman’s  model 
is  most  applicable  when  explaining  the  performance  ratings  while
Feldman's  model  best  explains  the  results  of  the  recognition  memory 
task?  Perhaps  raters  utilize  different  cognitive  strategies  at  different
levels  of  appraisal.  Fiske  and  Taylor  (1984)  outline  four
levels  of  specificity  that  may  create  variations  in  memory  accuracy.
Recall  can  be  more  or  less  accurate  about  (1)  people  in  general,  (2) 
specific  people,  (3)  specific  attributes,  or  (4)  specific  attributes 
for  specific  people.  Participants  in  the  present  study  were  asked  to 
make  general  performance  ratings  that  merely  required  reliance  on  a 
global  impression.  They  were  also  requested  to  recall  specific  be¬ 
havioral  items,  thus  requiring  a  more  accurate  encoding  process  and 
efficient  search  of  memory.  In  terms  of  the  four  levels  of  Fiske 
and  Taylor  (1984),  the  performance  ratings  depend  on  raters'  accuracy 
about  specific  people,  in  this  case  one  specific  lecturer.  On  the 
other  hand,  accuracy  on  the  recognition  items  relies  on  memory  about 
specific  attributes  for  specific  people. 

Fiske  and  Taylor  (1984)  also  argue  that  efficiency  and  accuracy
are  often  traded  off  against  each  other.  This  may  explain  the  low  levels
of  accuracy  found  in  the  recognition  items  in  this  study.  It  is  not 
efficient  for  raters  to  categorize  information  at  a  specific  behavioral 
level  when  they  can  be  relatively  accurate  on  performance  ratings  by 
simply  depending  on  a  more  general,  summary  evaluation  of  the  target 
stimulus.  Thus,  in  order  for  subjects  to  efficiently  store  and  access 
information  for  the  performance  ratings,  some  accuracy  was  sacrificed, 
as  evidenced  in  the  recognition  task. 

In  a  related  vein,  Lord  (1985)  suggests  that  there  are  three
approaches  to  defining  accuracy  in  behavioral  measurement.  These 
include  behavioral  accuracy,  classification  accuracy,  and  differences 
in  rater  decision  criteria.  The  first  two  approaches  are  of  particular 
interest  to  the  findings  of  the  current  research  project.  According
to  Lord  (1985),  high  behavioral  accuracy  occurs  when  raters  process
behavioral  information  effectively  and,  at  the  same  time,  distinguish
this  information  from  behaviors  that  were  not  observed  but  appear
plausible  based  on  knowledge  about  the  ratee.  This  definition  corre¬ 
sponds  to  the  type  of  accuracy  required  to  correctly  respond  to  the 
behavioral  recognition  items  in  the  present  study. 

Classification  accuracy  refers  to  the  categorization  of  information
based  on  a  global  impression  of  the  ratee.  The  target  stimulus
is  simplified  and  there  is  merely  an  overall  classification  of
information.  When  completing  the  performance  ratings,  participants 
in  this  study  only  had  to  rely  on  classification  accuracy.  This 
approach  enabled  subjects  to  combine  efficiency  and  accuracy  to  their 
greatest  advantage. 

The  findings  of  this  project  are  generally  consistent  with 
the  results  of  the  experiment  conducted  by  Nathan  and  Lord  (1983).
They  found  that  predictions  of  the  traditional  model  were  upheld  in
their  performance  ratings.  However,  the  existence  of  halo  in  the 
ratings,  and  errors  in  recognizing  specific  behavioral  incidents  lent 
support  to  categorization.  Together,  these  studies  suggest  that 
future  research  is  needed  to  uncover  the  generalizability  of  the
results  to  the  organization.  Both  experiments  presented  videotaped 
stimuli  to  subjects  in  a  controlled  environment.  Performance 
appraisals,  however,  are  often  made  under  more  complex,  confounded, 
and  ambiguous  circumstances. 

Additionally,  the  generalizability  of  these  findings  is
limited  by  the  use  of  upward  rather  than  downward  appraisals.
Research  needs  to  focus  on  supervisors'  ratings  of  subordinates,  as
that  is  the  normal  organizational  procedure.

Research  is  also  needed  that  would  span  a  longer  time  period.
Although  this  project  examined  cognitive  processes  across  five  days, 
this  is  a  minimal  demand  on  memory  in  comparison  to  the  yearly
appraisals  that  are  common  in  organizations.  In  this  way,  the  effect 
of  performance  patterns  can  be  demonstrated.  Murphy,  Balzer,  Lockhart, 
and  Eisenman  (1985)  recently  explored  the  impact  of  prior  performance 
on  evaluations  of  present  performance.  In  part  they  found  that 
deviations  from  previous  performance  patterns  can  intensify  the  dif¬ 
ficulty  of  making  accurate  appraisals.  A  strong  contrast  effect 
was  observed  in  both  the  performance  evaluations  and  the  ratings  of 
the  frequency  of  numerous  critical  behaviors.  Unfortunately,  they 
were  unable  to  adequately  explain  the  memory  processes  involved,
in  spite  of  conditions  that  minimized  memory  requirements.  A  synthe¬ 
sis  of  this  focus  on  performance  patterns  together  with  a  longitudinal 
cognitive  emphasis  should  reveal  unique  applications  to  appraisals 
in  organizational  settings. 

Finally,  performance  appraisal  models  encompassing  the  complex 
cognitive  processes  of  raters  are  needed  to  generate  and  advance 
research.  In  this  respect,  the  model  and  research  propositions 
developed  by  DeNisi,  Cafferty,  and  Meglino  (1984)  provide  a  promising 
foundation  on  which  future  appraisal  research  can  be  based. 